Term Frequency Normalization via Pareto Distributions

Authors

  • Gianni Amati
  • C. J. van Rijsbergen
Abstract

We exploit the Feller-Pareto characterization of the classical Pareto distribution to derive a law relating the probability of a given term frequency in a document to the document's length. A similar law was derived by Mandelbrot. We use the Paretian distribution to obtain a term frequency normalization that substitutes for the actual term frequency in the probabilistic models of Information Retrieval recently introduced in TREC-10. Preliminary results show that the unique parameter of the framework can be eliminated in favour of the term frequency normalization derived from the Paretian law.

Similar resources

A new look at q-exponential distributions via excess statistics

Q-exponential distributions play an important role in nonextensive statistics. They appear as the canonical distributions, i.e. the maximum generalized q-entropy distributions under mean constraint. Their relevance is also independently justified by their appearance in the theory of superstatistics introduced by Beck and Cohen. In this paper, we provide a third and independent rationale for the...

Shannon entropy in generalized order statistics from Pareto-type distributions

In this paper, we derive the exact analytical expressions for the Shannon entropy of generalized order statistics from Pareto-type and related distributions.

Bivariate Distributions via a Pareto Conditional Distribution and a Regression Function

Uniqueness of specification of a bivariate distribution by a Pareto conditional and a consistent regression function is investigated. New characterizations of the Mardia bivariate Pareto distribution and the bivariate Pareto conditionals distribution are obtained.

Information and Covariance Matrices for Multivariate Pareto (IV), Burr, and Related Distributions

The main result of this paper is the derivation of exact analytical expressions for the information and covariance matrices of multivariate Pareto, Burr and related distributions. These distributions arise as tractable parametric models in reliability, actuarial science, economics, finance and telecommunications. We showed that all the calculations can be obtained from one main moment multidimensional integr...

Perspective - From Gaussian to Paretian Thinking: Causes and Implications of Power Laws in Organizations

While normal distributions and related current quantitative methods are still relevant for some organizational research, the growing ubiquity of power laws signifies that Pareto rank/frequency distributions, fractals, and underlying scale-free theories are increasingly pervasive and valid characterizations of organizational dynamics. Where true, researchers ignoring power law effects risk drawi...

Publication date: 2002